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Abstract. When models of quantitative genetic variation are built from population ge- 
netic first principles, several assumptions are often made. One of the most important 
assumptions is that traits are controlled by many genes of small effect. This leads to 
a prediction of a Gaussian trait distribution in the population, via the Central Limit 
Theorem. Since these biological assumptions are often unknown or untrue, we charac- 
terized how finite numbers of loci or large mutational effects can impact the sampling 
distribution of a quantitative trait. To do so, we developed a neutral coalescent-based 
framework, allowing us to experiment freely with the number of loci and the underlying 
mutational model. Through both analytical theory and simulation we found the nor- 
mality assumption was highly sensitive to the details of the mutational process, with 
the greatest discrepancies arising when the number of loci was small or the mutational 
kernel was heavy-tailed. In particular, fat-tailed mutational kernels result in multimodal 
sampling distributions for any number of loci. An empirical analysis of 7079 expressed 
genes in 49 Neurospora crassa strains identified 116 genes with non-normal sampling dis- 
tributions. Several genes showed evidence of multimodality and/or skewness, suggesting 
the importance of their genetic architecture. Since selection models and robust neutral 
models may produce qualitatively similar sampling distributions, we advise extra caution 
should be taken when interpreting model-based results for poorly understood systems of 
quantitative traits. 

1. Introduction 

Questions about the distribution of traits that vary continuously in populations were 
critical in motivating early evolutionary biologists. The earliest studies of quantitative trait 
variation relied on phenomenological models, because the underlying nature of heritable 
variation was not yet well understood (Galton, 1883, 1889; Pearson, 1894, 1895). Despite 
the rediscovery of the work of Mendel (1866), researchers studying continuous variation in 
natural populations were initially skeptical that the Mendel's laws could explain what they 
observed (Weldon, 1902; Pearson, 1904). These views were reconciled when Fisher (1918) 
showed that the observations of correlation and variation between phenotypes in natural 
populations could be explained by a model in which many genes made small contributions 
to the phenotype of an individual. 

The insights of Fisher (1918) made it possible to build models of quantitative trait 
evolution from population genetic first principles. Early work focused primarily on the 
interplay between mutation and natural selection in the maintenance of quantitative genetic 
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15 variation in natural populations, while typically ignoring the effects of genetic drift (Fisher, 

16 1930; Haldane, 1954; Latter, 1960; Kimura, 1965). 

17 However, genetic drift plays an important role in shaping variation in natural popu- 

18 lations. While earlier work assumed that a finite number of alleles control quantitative 

19 genetic variation (e.g. Latter (1970)), Lande (1976) used the continuum-of-alleles model 

20 proposed by Kimura (1965) to model the impact of genetic drift on differentiation within 

21 and between populations. A key assumption of Lande's models is that the additive genetic 

22 variance in a trait is constant over time. In fact, in finite populations the genetic variance 

23 itself is random; at equilibrium, there are still stochastic fluctuations around the determin- 

24 istic value assumed by Lande, even if none of the underlying genetic architecture changes 

25 (Burger and Lande, 1994). 

26 Several later papers explored more detailed models to understand how genetic variance 

27 changes through time due to the joint effects of mutation and drift (e.g. Chakraborty 

28 and Nei (1982)). Lynch and Hill (1986) undertook an extremely thorough analysis of 

29 the evolution of neutral quantitative traits. They analyzed the moments (e.g. mean and 

30 variance) of trait distributions that arise due to mutation and genetic drift and provided 

31 several quantities that can be used to interpret variation within and between species and 

32 analyze mutation accumulation experiments. 

33 Much of this earlier work has made several simplifying assumptions about the distri- 

34 bution of mutational effects and the genetic architecture of the traits in question. For 

35 instance, Lynch and Hill (1986), despite analyzing quite general models of dominance and 

36 epistasis, ignored the impact of heavy tailed or skewed mutational effects. While, in many 

37 cases, such properties of the mutational effect distribution are not expected to have an 

38 impact if a large number of genes determine the phenotype in question, it is unknown 

39 what impact they may have when only a small number of genes determine the genetic 

40 architecture of the trait. Moreover, when mutational effects display "power-law" or "fat- 

41 tailed" behavior, the impact of the details of the mutational effects may persist even in the 

42 so-called infinitesimal limit of a large number of loci with small effects. Finally, mutation 

43 accumulation experiments have produced skewed and/or leptokurtic samples of quantita- 

44 tive traits (Mackay et al., 1992), which is a direct motivation to relax assumptions on the 

45 mutational effects distribution. 

46 Such deviations that stem from the violations of common modeling assumptions have the 

47 potential to influence our understanding of variation in natural populations. For instance, 

48 leptokurtic trait distributions may be a signal of some kind of diversifying selection (Kopp 

49 and Hermisson, 2006) but are also possible under neutrality when the number of loci 

50 governing a trait is small. Similarly, multimodal trait distributions may reflect some kind 

51 of underlying selective process (Doebeli et al., 2007) but may also be due to rare mutations 

52 of large effect. 

53 We have two main goals in this work. Primarily, we want to assess the impact of viola- 

54 tions of common assumptions on properties of the sampling distribution of a quantitative 

55 trait (e.g. variance, kurtosis, modality). Secondly, we believe that the formalism that we 

56 present here can be useful in a variety of situations in quantitative trait evolution, particu- 

57 larly in the development of robust null models for detecting selection at microevolutionary 



Downloaded from http://biorxiv.org/on September 18, 2014 



SENSITIVITY OF QUANTITATIVE TRAITS TO MUTATIONAL EFFECTS, NUMBER OF LOCI, AND POPULATION HISTORY 

58 time scales. To this end, we introduce a novel framework for computing sampling dis- 

59 tributions of quantitative traits. Our framework builds upon the coalescent approach of 

60 Whitlock (1999), but allows us to recover the full sampling distribution, instead of merely 

61 its moments. 

62 First, we outline the biological model and explain how we can compute quantities of 

63 interest using a formalism based on characteristic functions. We then use this approach to 

64 compute the sample central moments. While much previous work focuses on only the first 

65 two central moments (mean and variance), we are able to compute arbitrarily high central 

66 moments, which are related to properties such as skewness and kurtosis. By doing so, we are 

67 able to determine the regime in which the details of the mutational effect distribution are 

68 visible in a sample from a natural population. Additionally, we explore the convergence to 

69 the infinitesimal limit and find that when "fat-tailed" effects are present, traditional theory 

70 based on the assumption of normality can lead to misleading predictions about phenotypic 

71 variation. Finally, to assess the impact of genetic architecture in natural populations, we 

72 identified for non-normal sampling distributions of gene expression among 49 Neurospora 

73 crassa individuals. 

74 2. Model 

75 The mechanistic model we construct has three major components: a coalescent process, 

76 a genetic mutational process that acts upon the controlling quantitative trait loci, and 

77 a mutational kernel that samples quantitative trait effect sizes. Together these processes 

78 generate the quantitative traits sampled from the study population while explicitly mod- 

79 eling their shared genetic ancestry. Although we opt for simple model components during 
so this exposition, the model generally supports more realistic and complex extensions, such 
si as population structure and epistasis. 

82 We assume that we sample n haploid individuals from a randomly mating population 

83 of size N. Initially, consider a trait governed by a single locus and we will later extend 

84 the theory to traits governed by multiple loci. Let /i be the mutation rate per generation 

85 at the locus, and 8 = 2N[i be the coalescent-scaled mutation rate. We model mutation 

86 as a process by which a new mutant adds an independent and identically distributed 

87 random effect to the ancestral state. Note that when the distribution of random effects is 

88 continuous, this corresponds to the Kimura (1965) continuum of alleles model. However, 

89 it is also possible for the effect distribution to be discrete, similar to the discrete model of 

90 Chakraborty and Nei (1982). While this model does not capture the impact of a biallelic 

91 locus with exactly two effects, the following theory could easily be modified to analyze that 

92 case. 

93 [Figure 1 goes here] 

94 Figure 1 shows one realization of both the coalescent and mutational processes for a 

95 sample of size 5. Given the phenotype at the root of the tree and the locations and effects 

96 of each mutation on the tree, the phenotypes at the tips are determined by adding mutant 

97 effects from the root to tip. To specify the root, we can assume without loss of generality 

98 that the ancestral phenotype for the entire population has a value 0 (this is similar to 
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99 the common assumption in quantitative genetics literature that the ancestral state at each 

100 locus can be assigned a value of 0). 

101 This mutational process can be described as a compound Poisson process (see also 

102 Khaitovich et al. (2005b); Chaix et al. (2008); Landis et al. (2013) for compound Pois- 

103 son processes in a phylogenetic context). To ensure that this paper is self contained, we 

104 briefly review relevant facts about compound Poisson processes in the Appendix. 

105 In the following, we ignore the impact of non-genetic variation and focus on the breeding 

106 value of individuals, i.e. the average phenotype of an individual harboring a given set of 

107 mutations. 

108 3. Results 

109 3.1. Computing the characteristic function of a sample. In many analyses, the 
no object of interest is the joint probability of the data. If we let X = (Xi, A2, ■ ■ ■ ,X n ) be 
in the vector representing the quantitative traits observed in a sample of n individuals, we 

112 denote the joint probability of the data as p(xi, x%, . . . , x n ). Note that, in general, X{ and 

113 Xj are correlated due to shared ancestry, and that p must be computed by integrating over 

114 all mutational histories consistent with the data. Hence, computing p directly is extremely 
us difficult. 

Instead, we compute the characteristic function of X. For a one-dimensional random 
variable, X, the characteristic function is defined as K(e lkX ) where i is the imaginary unit, 
A; is a dummy variable. Generalizing this definition to an n-dimensional random variable, 
we are interested in computing 



116 where k = (fci, &2, . . . , k n ) is a vector of dummy variables. Like a probability density 

117 function, the characteristic function of X contains all the information about the distribution 
us of X. Moreover, computing moments of X is reduced to calculating derivatives of the 

119 characteristic function, which will prove useful in the following. 

120 We calculate this formula in two parts. First, we compute a recursive formula for (f>nt 

121 the characteristic function given that ancestral phenotype of the sample is equal to 0. 

122 Then, we compute p n , the characteristic function of the ancestral phenotype of the sample, 

123 assuming that the characteristic function of the population is equal to 0. As we show in the 

124 Appendix, we can then multiply these characteristic functions to obtain the characteristic 

125 function of X. 

126 We use a backward-forward argument to compute the recursive formula, first condi- 

127 tioning on the state when the first pair of lineages coalesce (backward in time) and then 

128 integrating (forward in time) to obtain the characteristic function for a sample of size n, 

129 <fi n . This results in 



A n (k) = E(e ; 



,ik T X 
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130 where h^- u ' v ^ is the vector of length n — 1 made by removing k u and k v and adding k u + k v 

131 to the vector of dummy variables. 

132 This equation has a straight-forward interpretation. The characteristic function for a 

133 sample of size n, <j) n , is simply the characteristic function for a sample of size n — 1, 

134 averaged over all possible pairs that could coalesce first, multiplied by the characteristic 

135 function for the amount of trait change that occurs more recently than the first coalescent. 

136 The multiplication comes from the fact that the characteristic function of a sum of inde- 

137 pendent random variables is the product of the characteristic functions of those random 

138 variables. We prove this result in the Appendix (Section 5.3). 

139 In the Appendix (Section 5.4), we also show that the characteristic function for the 

140 phenotype at the root of the sample is 



Pn(k) = n\(n-l)J2]J : —— 1 



v(v — 1) u\ 



, ., , , 9Mk)-l)(n + u)V 

141 Intuitively, this equation arises by conditioning on whether u lineages are left in the popu- 

142 lotion when the sample reaches its common ancestor and then averaging over the (random) 

143 time between when the individuals in the sample coalesce and when everyone in the pop- 

144 ulation coalesces. 

145 Hence, the characteristic function for a sample of size n is 

A n (k) = p n (ki + k 2 + • • • + fc n )0 n (k). 

146 3.2. Sampling traits controlled by a small number of loci. It is common practice 

147 in both theoretical and applied quantitative genetics to summarize information about the 

148 phenotypic distribution within a population by computing central moments. However, care 

149 must be taken when interpreting theoretical predictions about central moments estimated 

150 from a sample. This is because the phenotypes in the sample are not independent, but 

151 instead correlated due to their shared genealogical history. Hence, in any particular pop- 

152 ulation, an estimate of a central moment may deviate from its expected value, even as the 

153 number of individuals sampled grows to infinity (Aldous, 1985). 

With this caveat in mind, we computed the first four expected central moments for a 
sample of phenotypes taken from this model (see Appendix for details). They are 

E(h 2 ) = X -QLm 2 
E{h 3 ) = hLm 3 

E(h 4 ) = ^9 2 L 2 ml + ^6L(29m 2 2 + m 4 ), 

154 where hk is the unique minimum variance unbiased estimator of the /cth central moment, 

155 rrik is the kth moment of the mutational effect distribution and L is the number of loci 

156 that influence the trait. 
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157 These equations reveal that it may be possible to construct method-of-moments estima- 

158 tors for the moments of the mutation effect distribution and/or the number of loci that 

159 govern a trait. 

160 3.3. "Infinitesimal" limits for large numbers of loci. Many traits are assumed to be 

161 governed by a large number of loci, each individually of small effect. This is known as an 

162 infinitesimal model (Falconer and Mackay, 1996). Typically, the sampling distribution in 

163 the infinitesimal limit is assumed to be Gaussian, by appealing to the central limit theorem. 

164 Here, we find that under certain circumstances traits may not be normally distributed, even 

165 in the limit. 

To obtain a non-trivial limit, we must assume that as the number of loci controlling 
the trait increases, the effect of each individual locus decreases. Then, computing the 
characteristic function for a trait governed by a large number of independent loci is simple 
due to the fact the characteristic function of the sum of independent random variables 
is the product of their characteristic functions. Thus, assuming that each locus has the 
same effect distribution (this assumption can be relaxed relatively easily) the characteristic 
function of the limit distribution is given by 



166 In the Appendix, we show that mutation effect distributions with power law behavior 

167 instead converge to a limiting stable distribution. A random variable X is said to have a 

168 power law distribution if P(X > x) ~ nx~ a for large x, some k > 0 and some a £ [0, 2). In 

169 this limit, individuals with shared genealogy may still have highly correlated phenotypes, 

170 due to rare mutations of large effect. 

On the other hand, all mutation effect distributions without power law behavior converge 
to a Gaussian limit, due to the central limit theorem. In the Appendix, we show that 
samples taken from a population in this limit can be represented as a sample from a 
normal distribution with a random mean. In particular, 



171 where M{m, s 2 ) represents a normal distribution with mean m and variance s 2 . 

172 3.4. Simulation. We simulated data to verify our analytical results and obtain some in- 

173 sight into the nature of the stable limiting distribution that arises for power law mutational 

174 effects. We first wanted to confirm that the trait distributions converge to univariate Gauss- 

175 ian limiting distributions as n — > oo and L — > oo when mutational kernels are not fat-tailed. 

176 To explore how the moments of the sampling distribution change with respect to n, L, and 



A n (k) = lim A„(k) L 



= lim p n (k x + k 2 + ... + /c n ) L 0 n (k) L 



= Rnih + k 2 + . . . + fc n )$„(k). 




Downloaded from http://biorxiv.org/on September 18, 2014 



SENSITIVITY OF QUANTITATIVE TRAITS TO MUTATIONAL EFFECTS, NUMBER OF LOCI, AND POPULATION HISTORY 

177 the mutational kernel, we asked for which values of L do the moments of the various muta- 

178 tional kernels leave a signature in the sampled quantitative traits. Finally, we conjectured 

179 that fat-tailed mutation kernels result in trait distributions that remain multimodal as 

180 L — > oo, which we verified by simulation rather than by mathematical proof. 

181 For these simulation studies, we selected four mutational kernels: (1) the symmetric 

182 normal distribution for it's simplicity, (2) the Laplace distribution because it is heavier- 

183 tailed (or more leptokurtic) than the normal distribution yet has finite variance, (3) the 

184 skew- normal distribution for it's skewness parameter and tractability, and (4) the symmet- 

185 ric a-stable distribution because of its power-law behavior. To ensure that simulations of 

186 different non-fat-tailed distributions were comparable, we set the variance per locus to be 

187 r 2 = a 2 / L when we simulated L loci, meaning the trait distribution would have constant 

188 variance 9a 2 / 2. Note the symmetric normal distribution is a special case of both the skew- 

189 normal distribution when the skewness parameter is zero and the a-stable distribution 

190 when the "fat-tailedness" parameter is a = 2. 

191 For all simulations, we generated coalescent genealogies and mutations using the program 

192 ms (Hudson, 2002). We then generated and mapped mutational effects using custom scripts 

193 in R (R Core Team, 2013). 

194 Code is available at http://github.com/Schraiber/quant_trait_coalescent. 

195 [ Figure 2 goes here. ] 

196 [ Figure 3 goes here. ] 

197 3.4.1. Univariate Gaussian limit. For mutational kernels of small effect size and variance 

198 t 2 per locus, the sampling distribution converges to a normal distribution with variance 

199 9o~ 2 /2 where Lt 2 — > a 2 as L — > oo. We simulated 100 replicates of trait data for L € 

200 {1, 2, 4, . . . , 256} and n £ {2, 4, 8 . . . , 512} with mutation parameters 9 = 2 and r = 1 

201 for the normal, the skew- normal (skewness=0.9), and the Laplace distributions. We then 

202 assessed convergence to the normal limit using the Kolmogorov-Smirnov (KS) test statistic, 

203 D, which equals zero when two distributions are identical. Figure 2 reports the frequency 

204 we reject the null hypothesis — that the limiting and sampled distributions are identical — 

205 for each batch of 100 replicates per value of n and L for p-values less than 0.05. For 

206 n < 4, the KS test lacked power to reject the null hypothesis whatsoever. For n > 8, the 

207 three mutational kernels converge to the limiting normal distribution in a similar fashion, 

208 with the sampling and limiting distributions bearing strong resemblance when L > 16. 

209 Distinctly, the Laplace distribution converges to normality at a slower rate than other 

210 mutational kernels, likely resulting from it being leptokurtic (Figure 3). 

211 [ Figure 4 goes here. ] 

212 3.4.2. Central moments. We assessed the signature left by various mutational kernels on 

213 the sampling distribution by computing the central moments across simulation replicates. 

214 While the variance (fo) remains constant for all values of L regardless of the mutational 

215 kernel (by experimental design), the third central moment (/13) and the fourth central 

216 moment (/14) depend on the mutational kernel for small values of L. As L — > 00, the 

217 sample moments converge to those of a normal distribution. Here, we characterize the 
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218 deviation from the normally-distributed moments under a variety of mutational kernels: 

219 the (symmetric) normal distribution; the skew-normal distribution for skewnesses 0.1, 0.5 

220 and 0.9; and the Laplace distribution. We omitted the a-stable distribution from this 

221 portion of the study since its moments h%, h%, and /14 only exist when a = 2, i.e. when it 

222 is Gaussian. 

223 We simulated data while varying the number of loci, L £ {1, 2, 4, . . . , 256}, holding 

224 the sample size constant, n = 1024, for 2000 replicates for each of the five mutation 

225 kernels. Afterwards, we computed the mean /12, /13, and /14 statistics across replicates of 

226 each mutation kernel and value of L for comparison with their expected /i-statistic values 

227 (Figure 4). As expected, hi remains constant regardless of the mutation kernel or L. The 

228 normal and Laplace distributions are symmetric and produce sample /13 values near zero, 

229 indicating no skewness. The skew-normal mutation kernel result in non-zero skewness 

230 even for traits controlled by over 100 loci so long as the kernel is sufficiently strongly 

231 skewed. The speed the sampling distribution's third central moment, /13, converges to 

232 zero in inverse proportion to the magnitude of its mutational kernel's skewness value. All 

233 distributions produce non-zero /14 values when L is small, due to the randomness of the 

234 mutation process. The /14 value of the Laplace distribution, the sole leptokurtic mutational 

235 kernel in this comparison, is the slowest of all kernels to converge to the normal limit. 

236 [ Figure 5 goes here. ] 

237 3.4.3. Multimodality. As n — > 00 and L — > 00, we proved that sampling distributions 

238 generated by finite-variance mutational kernels converge to the unimodal normal distri- 

239 bution and conjectured that power-law mutational kernels, such as the a-stable, converge 

240 to multimodal stable distributions. Here, we demonstrate by simulation our proven and 

241 conjectured modality results hold as L — > 00. 

242 To do so, we test for unimodality using the dip statistic, D (Hartigan and Hartigan, 

243 1985). Briefly, D(Fq,Fi) gives the minimized maximum difference between an empirical 

244 distribution, F±, and some unimodal (null) distribution, Fq, where Fq is typically taken 

245 to be the uniform distribution. D approaches zero when F\ is unimodal and equals | 

246 when the distribution is perfectly bimodal (i.e. two point masses). We used the R package 

247 diptest (Maechler and Ringach, 2012) to compute p-values for each simulated dataset, 

248 recording the frequency of replicates whose p-value is less than 0.05 for each mutational 

249 kernel and value of L. If the limiting distribution is unimodal, we expect this frequency to 

250 be less than 0.05 as L increases. Conversely, we expect multimodal limiting distributions 

251 to converge in frequency to some value greater than 0.05 as L increases. 

252 We complemented the simulated data from Section 3.4.2 with three additional a-stable 

253 mutational kernels for a G {1.5, 1.7, 1.9}, keeping the coalescent-mutation process variance 

254 equal across all datasets. 

255 Figure 5 shows the trait distributions under a-stable mutation kernels remained mul- 

256 timodal as L — > 00 and stratify according to their respective a values: as a decreases 

257 the large-effect mutations responsible for multimodality grow more prominent. Mutation 

258 kernels of small effect size become unimodal as L — > 00. Notably, Laplace-distributed mu- 

259 tations converge to unimodality more slowly the normally-distributed mutations, echoing 
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260 the results reported in Sections 3.4.1 and 3.4.2. Also note that when the number of loci 

261 is small (L < 4) the sampling distribution is multimodal regardless of the mutation ker- 

262 nel. This corroborates our earlier KS tests (Section 3.4.1), which found simulated data for 

263 L < 4 bore little to no resemblance to a unimodal normal distribution. 

264 3.5. Neurospora crassa gene expression. Our simulations show that skewness, leptokur- 

265 tosis, and multimodality may surface in the sampling distribution of quantitative traits, so 

266 we searched for these patterns in the N. crassa gene expression data reported by Ellison 

267 et al. (2011). Based on our modeling framework, the details of the deviation from normality 

268 can be used to infer the characteristics of the underlying mutational kernel. We selected this 

269 dataset because the data were collected so as to minimize environmental effects, because 

270 many changes gene expression may only be weakly deleterious if not neutral, and because 

271 transcriptomes contain thousands of comparable and consistently measurable quantitative 

272 traits. For these analyses, we only look at properties of the sampling distribution and make 

273 no assumptions about the generating process. 

274 Forty-eight individuals were sampled from a wild population in Louisiana. The samples 

275 were then propagated in a controlled laboratory setting to minimize environmental and 

276 genotype-by-environmental effects on the quantitative traits. RNAseq raw read counts 

277 were obtained for 9793 genes, then normalized using upper quartile normalization (raw 

278 read counts divided by transcript length, further divided by the third quartile of all ranked 

279 read counts per individual). All expression levels were log-transformed. To control for 

280 noise, we discarded any weakly expressed gene with values less than log(-5.0) for any of the 

281 48 individuals, then additionally discarded the most extreme value per gene, yielding 7079 

282 genes with 47 individuals per gene. We expect these noise-control filters to bias our trait 

283 distribution toward normality, in particular, towards unimodal symmetric distributions 

284 with no excess kurtosis. 

285 [ Figure 6 goes here. ] 

286 We used the Shapiro- Wilk test to assess normality and the dip test to assess multimodal- 

287 ity Additionally, we computed the sample skewness and kurtosis. Twenty-five and 697 

288 genes reported p-values less than 0.05 for the dip test and Shapiro- Wilk tests, respectively, 

289 1 4 of which fell into both categories (Figure 6A). We saw that genes in which normality is 

290 rejected tend to be positively and negatively skewed with approximately the same frequency 

291 (Figure 6B), and are more often leptokurtic than platykurtic (Figure 6C) For genes where 

292 we failed to reject normality, the mean sample skewness is -0.016 (i.e. mildly negatively 

293 skewed) and mean sample kurtosis is 2.78 (i.e. mildly platykurtic). After correcting for a 

294 false discovery rate of 10%, we identified one multimodal trait and 116 non-normally dis- 

295 tributed traits. Among these discoveries, 57 were negatively skewed while 59 were positively 

296 skewed, and 23 were platykurtic while 93 were leptokurtic. Figure 7 shows the sampled 

297 trait distributions for six of the 116 non- normally distributed genes. The complete list of 

298 116 genes is available at https://github.com/Schraiber/quant_trait_coalescent. 

299 [ Figure 7 goes here. ] 
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300 The Shapiro- Wilk and dip tests may have insufficient power to reliably reject normality 

301 and unimodality hypotheses when given only 48 samples (49 samples minus the one most 

302 extreme- valued sample). We assessed power by simulation in a few simple cases. For 

303 example, for 48 samples drawn from an a-stable with a = 1.8, approximately 41% rejected 

304 normality under the Shapiro- Wilk test. Similarly, for a skew- normal distribution with 

305 skewness 0.5, 23% rejected normality. A simple bimodial distribution with equal mixture 

306 weights, means (0,4), and standard deviations (1, 1) rejected the dip test 43% of the time. 

307 When one mixture component outweighs the second component four-fold (i.e. when a 

308 minor clade in the population carries a mutation of large effect), the dip-test is rejected 

309 only 0.71% of the time. 



310 4. Discussion 

311 The natural world is replete with quantitative trait variation and understanding the 

312 forces governing their evolution is a central goal of evolutionary biology. The model of 

313 Fisher (1918), which explained how quantitative variation can be generated by Mendelian 

314 inheritance, provides an underpinning for understanding the generation and maintenance of 

315 variation in continuous characters. A primary assumption of much of this work is traits are 

316 controlled by a large number of loci and that new mutations have a very small, symmetric 

317 effect on the trait value. 

318 In this work, we introduced a coalescent framework for modeling neutral evolution in 

319 quantitative traits. This stands in contrast to past work, which has typically taken a 

320 forward-in-time approach based on classical population genetics (but see Whitlock (1999) 

321 who also utilized a coalescent model). Our backward-in-time, sample- focused approach 

322 enabled us to derived an expression for the joint distribution of the data with arbitrary 

323 mutational effects and numbers of loci. We found that traits governed by a large number 

324 of loci with small effects are well-modeled by a Gaussian distribution, as expected. How- 

325 ever, we saw that with small numbers of loci, significant departures from normality can 

326 be observed. Moreover, for fat-tailed (or power-law) mutational kernels, there are signifi- 

327 cant departures from normality (including multi-modality) , even when the number of loci 

328 becomes large. 

329 We assessed departure from normality in traits governed by a small number of loci by 

330 exploring the central moments of three different mutational kernels (normal, skew-normal 

331 and Laplace distributions) both analytically and by simulation. We showed that although 

332 all three mutational kernels converge to a Gaussian distribution, traits controlled by a 

333 small number of loci retain the signature of their underlying mutational kernel in their 

334 3rd and 4th central moments. Hence, it may be possible to reconstruct aspects of the 

335 mutational effect distribution by observing phenotypes in natural populations. This may 

336 be particularly interesting for analyzing variation in gene expression, because mutational 

337 effects in cis may be strongly skewed (Khaitovich et al., 2005a; Chaix et al., 2008; Gruber 

338 et al., 2012). Our theory suggests that the distribution of gene expression in a population 

339 might therefore be skewed. 
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340 We were also interested in the circumstances under which multi-modal phenotypic dis- 

341 tributions can arise. When a trait has a simple genetic architecture, it's easy to see that 

342 there must be discrete phenotypic clusters, corresponding to groups of individuals sharing 

343 the same mutations. As the number of loci increases, there are more mutational targets 

344 (and thus more mutation events), which smooths the distribution, causing the sampling 

345 distribution to converge to the appropriate limiting distribution. For mutational effects 

346 with finite variance, this ultimately results in a limiting Gaussian distribution, consistent 

347 with the central limit theorem. However, when the mutational kernel is fat-tailed, the 

348 marginal effects of each locus do not vanish as the number of loci grows. Thus, some 

349 clade-specific mutations will always be of large effect despite the number of loci assumed 

350 by the model, resulting in a multi-modal sampling distribution. 

351 Following our simulations, we asked whether the sampling distributions for empirical 

352 quantitative trait data were testably non-normal, as might be generated by the neutral 

353 model we presented earlier. For a sample of 49 strains of N. crassa, we found 116 of 7079 

354 genes were detectably non-normal with a false discovery rate of 10%. Qualitatively, many 

355 more than 116 genes appeared non- normal, but more than 49 samples are needed for suffi- 

356 cient testing power. Because the data were generated while controlling for environmental 

357 effects then filtered for noisy measurements, we expect the quantitative trait variation must 

358 be explained predominantly by genetic factors. Several genes showed evidence of skewness, 

359 which our model shows could result from a skewed mutant effect distribution. Similarly, 

360 many genes showed evidence for leptokurtosis, although this may be due to either a simple 

361 genetic architecture (i.e. the trait is controlled by few loci) or a truly leptokurtic mutational 

362 kernel. The neutral model we proposed describes gene expression evolution only when no 

363 selection is acting on the quantitative trait, though we suspect that many of our results will 

364 hold qualitatively under weak selection. Of course, the assumption of weak or no selection 

365 is unlikely to be true across the entire transcriptome. Nonetheless, we believe that these 

366 qualitative aspects of the data can be used to shed light on the underlying mutational 

367 processes governing quantitative trait evolution. Whatever genetic process generated these 

368 data, e.g. an adaptive model, an adequate model must be capable of explaining skewed, 

369 leptokurtic, and multimodal sampling distributions. 

370 These results show that even under the assumption of neutrality, significant departures 

371 from normality are possible and can be detected in empirical data. It is possible that these 

372 deviations from normality may be conflated with signatures of selection acting on quan- 

373 titative variation. Several recent studies have claimed that evidence of non-Gaussianity 

374 may be evidence for non-neutral evolution at macroevolutionary time scales. For instance, 

375 Khaitovich et al. (2005a); Chaix et al. (2008) found that the distribution of gene expression 

376 differences between great apes is strongly positively skewed. Similarly, Uyeda et al. (2011) 

377 argued that there is a one million year wait between bursts of evolution in the fossil record 

378 and numerous studies have explored non-Gaussian trait divergence in a phylogenetic con- 

379 text (Landis et al., 2013; Eastman et al., 2013). While it is unlikely that the population 

380 genetic model we developed can be directly applied to macroevolutionary data of this sort 

381 (Estes and Arnold, 2007), it is important to recognize that such effects can be due to purely 

382 neutral processes. 
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383 On shorter time scales, there is significant interest in detecting non-neutral quantitative 

384 trait evolution among closely related species or populations. One powerful method com- 

385 pares a measure of quantitative trait divergence, Q s t, to the fixation index, F s t (McKay and 

386 Latta, 2002; Ovaskainen et al., 2011). However, this requires estimates of breeding values 

387 from common-garden experiments, and may be difficult to achieve. In other cases (e.g. 

388 Lemos et al. (2005)) more phenomenological approaches are taken, by comparing within 

389 and between species phenotypic diversity. The null distributions of these approaches typi- 

390 cally rely on assumptions of the infinitesimal model, which we have shown may be violated 

391 due to mutations of large effect and/or loci with relatively simple genetic bases. To address 

392 these issues and leverage the abundance of modern quantitative trait data, Berg and Coop 

393 (2014) developed a method that explicitly uses breeding values estimated from quantitative 

394 trait mapping studies. When such effect size estimates are unavailable, it may be possible 

395 to use our formalism to develop robust null models to detect selection. 

396 Our coalescent approach can be extended in several ways. Notably, we consider only 

397 haploid populations. In principle, an extension to diploid individuals is straight-forward 

398 using the result of Mohle (1998) that diploid, dioecious populations of size N are readily 

399 modeled by pairing random chromosomes from a haploid population of size 2N. To incor- 

400 porate diploidy, we would also need to incorporate a model of dominance, of which several 

401 exist in the literature (e.g. the model of independent dominance of Lynch and Hill (1986). 

402 From the point of view of the coalescent process, it is straightforward to apply our model 

403 to populations that have undergone complex demographic histories. This is because the 

404 dynamics of a coalescent under population size fluctuations and population structure are 

405 well known. Moreover, we explored only unlinked, neutral loci and it may be possible 

406 to obtain some analytical results for linked loci and/or weak natural selection by using 

407 the ancestral recombination graph and ancestral selection graph, respectively. While an- 

408 alytical results are difficult within these frameworks, we believe that they can be used 

409 to perform simple simulations of quantitative traits evolving in complex scenarios, thus 

410 enabling Approximate Bayesian Computation. 

411 5. Appendix 

412 5.1. Compound Poisson processes. To obtain the probability of the data under this 

413 model, we must be able to compute the probability of the change in phenotype along a 

414 branch of the tree. Unfortunately, except for very simple mutational models, this probabil- 

415 ity is impossible to compute analytically. Instead, we compute the characteristic function 

416 of the change along a branch. 

417 Using standard results for compound Poisson processes (Kingman, 1992), we see that 

418 the characteristic function of the change along a branch of length t (in coalescent units) is 

(3) <Pt(k) = eiW)- 1 ) 

419 where ip is the characteristic function of the mutational effect distribution. 
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5.2. The phenotype at the root of the sample genealogy and the subsequent 
evolution within the sample are subindepenent. Note that 

A n (k)=E(e lkTx ) 

= ^,( e i ( k iC R -+£i)+k2(K+S 2 ) + ...+k n {K+£ n ))^ 

where 1Z is the phenotype at the root of the sample genealogy and E u is the subsequent 
evolution leading to lineage u in the sample. So, 

E ^ e i(k 1 {n+£ 1 )+k 2 (TZ+£2) + -+k n ('R.+S n ))^ _ ^^ e i{k 1 (K+£ 1 )+k2(K+£2)+...+k n (K+£ n ))^^ 

— ^^ e i(ki+k2+...+k n )K)^^ e i(k 1 £ 1 +k 2 £2+---+k n £ n )^ 

420 where the last line follows by independent and stationary increments of the compound 

421 Poisson process. Thus, 1Z and (<?i,<?2, • • • ,£ n ) subindependent, and hence their joint char- 

422 acteristic function is the product of their characteristic functions. 

5.3. Proof of recursive formula for the characteristic function. First, we condition 
on the state at the first coalescence (going back in time). The state consists of three 
components: 1) which pair of individuals coalesce, (u,v), 2) the time of the coalescent 
event, T c , and 3) the trait value in each lineage at that time, X' (note that, given (u,v), 
we have that X' u = X' v , since those two ilneages have coalesced and hence had the same 
trait value at the time of coalescence). Then, 

E(e- kTx ) = E ( „),x.,r t (e (e* Tx \(u, v), X', T,_ 

= ^T)E^(E(, ikTx l(M),x',r, 

= t^E^.t, (E (e^ T ( x W»|(n,,),X',T c 
n(n — 1 ^-^ V V 

(4) = *^ Y.^r. {e'"^ {e^^K)) 

423 where Y(t) = (Yi (t) , Y2 (t) , . . . ,Y n (t)) is the vector accounting for the evolution on each 

424 lineage that occurs during time t. The second line follows by the fact that each pair 

425 is equally likely to coalesce (with probability (2) ) and the third line by independent 

426 increments of a compound Poisson process. 

427 Now, we compute the internal expectation going forward in time. Noticing that E(e* k Y ( Tc ) \T C 

428 is simply the characteristic function of a compound Poisson process run for length T c , we 

429 see from (3) that 



E(e^|T c )=exp T c 




i=l 
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430 Because T c and X' are independent, we can integrate over T c analytically in the outer 

431 expectation. The distribution of the time to the first coalescent event in a sample of size 

432 n is Exponential with rate Q) , hence, 

(ex P { \T. [± «*, - n) }) = n(n _ t) _ff^%_ B) . 

433 Plugging this result into (4) results in 

E ( elkTX ) = ~7 n — WT^ 1 — Uk\ \12 E x' K*') > 

V J n(n-l)-e{}Z i=1 ip(ki)-n)^ v V J 

434 but since X' is simply the result of the same process where two of the entries are identical, 

435 we obtain the recursive formula (1). 

436 To initialize the recursion, we must compute the characteristic function for a sample of 

437 size 2. This is 

(5) 0 2 (k) = 



6^(k 1 ) + iP(k 2 )-2)- 



438 5.4. The phenotype at the root of the sample genealogy. First, we note that, 

439 conditional on the time between when the sample genealogy finds a common ancestral and 

440 the population genealogy finds a common ancestor, A, the characteristic function of the 

441 phenotype at the root of the sample genealogy is 

442 by using equation (3). Thus, the after integrating over A, the desired quantity is moment 

443 generating function of A, defined by 

M(z) = E(e zA ) 

444 evaluated as z = ^(^(k) — 1). 

445 We compute M{z) by conditioning on how many lineages are left in the population 

446 genealogy when the sample reaches its most recent common ancestor. To do this, we make 

447 use of a result of Saunders et al. (1984), 

n\ 

P(u lineages left in population! sample coalesced) = n!(n — 1)7 -r. 

(n + u)\ 

Given that u lineages are left in the population when the sample reaches its most recent 
common ancestor, the remaining time until the whole population reaches its common an- 
cestor is simply the time it takes for a coalescent started with u to reach its most recent 
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common ancestor, C u . Thus, 

M{z) = E(e zA ) 

= E u (E(e zA \u lineages left when sample coalesces)) 

oo I 

= J2 E (e zCu )nl(n-l) 



u=l 

oo 



(n + u)\ 



2^ll v ( v _l)_2z 1 } {n + u)\ 

u=lv=2 K ' \ • / 

448 where the final line follows by recognizing that C u is the sum of u — 1 independent ex- 

449 ponential random variables with means Q), O^ 1 )' • • ■ > (2)' Substituting ^(ip(k) — 1) for z 

450 yields the desired result. 

5.5. Computing sample central moments. While it is difficult to compute the expec- 
tation of any sample central moments for a particular sample, it is possible to average over 
replicate populations to compute expectations. This results in 

E(h 2 ) = E(Xf) - E(X X X 2 ) 

E(h 3 ) = E(Xf) + 2E{X 1 X 2 Xz) - 3E(XfX 2 ) 

E(h 4 ) = E(Xf) + 6E(XfX 2 X 3 ) - AE{XfX 2 ) - ZEiX^X^), 

451 where the expectations on the right hand sides are over the correlated phenotypes in the 

452 sample. It is possible to compute these expectations by taking derivatives of the char- 

453 acteristic function (1). After simplifying, one then arrives at the formulas in the main 

454 text. 

455 5.6. Derivation of multivariate stable limit for sample distribution. Recall that a 

456 random variable X is said to have a fat-tailed (or power-law) distribution if 

(6) P(X > x) ~ Kx~ a 

457 for large x and some k > 0. As is typical in the literature, we reserve the term "fat-tailed" 

458 for distributions with a G (0, 2). 

459 To obtain an appropriate scaling limit, we assume that there is a parameter t, related 

460 to the parameter k in (6) by 



(7) t 



TT 



sin(a7r/2)r(a)a' 

461 such that Lt — > s as n — > 00. The parameter s is related to the scale parameter of the 

462 resulting limit distribution. 

463 We provide a heuristic derivation, rather than a rigorous proof. First, we argue by 

464 induction that the (per locus) characteristic function for a sample of size n is 



,(k) ~ 1 




26j 



Downloaded from http://biorxiv.org/on September 18, 2014 



Hi 



JOSHUA G. SCHRAIBER AND MICHAEL J. LANDIS 



465 for large L, where V* (k) is the power set of the elements in k, except the set {ki , &2, • • ■ , k n }, 

466 and Cn \j\ is a combinatorial constant that depends only on the sample size n and |j|, the 

467 size of the set j. 

Note that for n = 2, this can be seen by observing that for large L, the characteristic 
function of a fat-tailed distribution is asymptotically 

s ,, , 



L 



Thus, 



ip(k) ~ 1 - 

i(k)~l-~(|kr + |fca| 



Now, assume that the formula holds for <j) n —\. Using the recursion (1), we have 

1 



6n(k) 



(2) ~~ T (SjeP*(k) S^ej z 



^^n-l(k^ 



(2) + 2T (E"=l M") 

The second line follows from plugging if) and (j), and c^ui arises by summing over the 
appropriate terms coming from all characteristic functions in the sum. Again looking for 
an asymptotic for large L, we see that 



(2) - S £ je p 



*(k) 



2L 



(E M =i l^ul 




E 



u=l 



Y c ™.ui 




\j67>*(k) 





= 0n(k). 

468 Finally, we note that by raising <^ n to the TLth power, and taking the limit as L 

469 we obtain the log characteristic function 



00, 



Y c ™.ui 




") 


\jeP*(k) 







(8) 



470 where all terms are defined as before. 

471 The characteristic function in (8) can be recognized to be that of a multivariate a-stable 

472 distribution (Press, 1972). These multivariate distributions are fat-tailed generalizations 
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473 of the familiar multivariate normal distribution, and this limit corresponds to a generalized 

474 multivariate central limit theorem for sums of random vectors with fat-tailed distributions. 

475 5.7. Limiting distribution of the phenotype at the root of the sample genealogy. 

476 Again, we proceed heuristically rather than rigorously. First, note that for large L, 

v(v-i) ^ o* 



v(v - 1) - 0(ifj(k) - 1) Lv(v-iy 



477 so that 



n 

V=2 



Lvv-l) 1 1 u L 



for large L. Thus, 



OG 



Pn (k) ~ n\(n - 1) £ ( i - — f w ) t-^it 

^-^ \ u L I (n + it)! 

U=l X / \ f 

So by definition of the exponential function, we have that 

R n (k) = lim p n (k) L 

L— >oo 

(9) =e-"l fc l Q . 

478 which the characteristic function of a univariate a-stable distribution, arising from the fact 

479 that the phenotype at the root of the sample genealogy is itself a limit of a sum of random 

480 variables. Note that as n — > oo (i.e. the sample becomes the whole population), R(k) — > 1, 

481 because the root of the sample genealogy is the same as the root of the population genealogy 

482 and the root value has been specified to be equal to 0. 

483 5.8. Multivariate Gaussian limits. For the case where the mutation distribution is not 

484 fat-tailed, we can use the multivariate central limit theorem to more efficiently derive the 

485 limiting distribution. The appropriate scaling in this case is to assume that if r 2 is the 

486 variance of the mutation effect kernel, then Lt 2 — > a 2 as L — > oo. 

487 To apply the multivariate central limit theorem, we must derive the pairwise covariances 

488 between samples. While the required covariances could be computed by taking derivatives 

489 of the characteristic function, it is more instructive to compute these moments directly. 

490 For simplicity, we assume that the mutation effect distribution has mean 0 and variance 

9 

491 T . 

492 Assume that the population genealogy at a single locus, Q, is fixed. Noting that the 

493 variance per unit time accrued by the mutational process is 9/2t 2 and using the rules for 

494 calculating covariance structure on a phylogeny, it's easy to see that for samples i and j 

495 we have 

(0 t 2 T if i = j 



Cov(Xi,xAg) 
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496 where T is the height of Q and and Ty is the height of the most recent common ancestor 

497 of samples i and j. We can then use the law of total covariance, 

Cov(Xi,Xj) = E(Cov(X i ,X J |g)) + Cov(E(X,|g),E(A^)) 

498 to see that 

„ , v v . J 9t 2 if i = j 
Cov(Xi,Xi) = < , „ 

V ° } \\9r 2 iii^j. 

499 This arises because E(T) = 2 and E(Ty) = 1. 

500 Hence, as the number of loci increases to infinity in such a way that Lt 2 — > a 2 , the 

501 sampling distribution converges to a multivariate normal distribution with mean 0 and 

502 variance covariance matrix E having elements 

Sij = 

503 Because the pairwise covariances are equal, the random vector X is an exchangeable Gauss- 

504 ian random vector. Hence, using well-known facts about the representation of exchangeable 

505 Gaussian random vectors, one arrives at the representation in the main text. 
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7. Figures 
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Figure 1. Example realization of coalescent process for a sample of size 
5. Mutations (marked as light gray X's), are placed upon the genealogy 
representing each individual in the population. Effects of each mutation 
are drawn from a probability distribution and are added along each branch 
length. The model is specified such that the most recent common ancestor 
(MRCA) of the population has phenotype 0.0, while the MRCA of the 
population may have a phenotype different from zero, due to mutations 
that accumulate between the MRCA of the sample and the MRCA of the 
population. 
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FIGURE 2. Frequency to reject normal sampling distribution. Heatmap 
cells correspond to number of sampled individuals, n, and number of loci, 
L. Panels are labeled with their respective mutational kernels. For 100 
simulated replicates per cell, heatmap values correspond to the frequency 
the Kolmogorov-Smirnov test rejects the null hypothesis {p < 0.05) that the 
sampling distribution and the limiting normal distribution are equal. White 
cells indicate the sampling distribution looks normal. Black cells indicate 
the sampling distribution does not look normal. 
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Figure 3. Comparison of frequency to reject normal sampling distribution. 
Heatmap cells correspond to number of sampled individuals, n, and num- 
ber of loci, L. Panels are labeled with their respective mutational kernels. 
For 100 simulated replicates per cell, heatmap values correspond to the fre- 
quency the Kolmogorov-Smirnov test rejects the null hypothesis (p < 0.05) 
that the sampling distribution, A, and the limiting normal distribution are 
equal minus the frequencies computed for a second distribution, B. White 
cells indicate both distributions report equal frequencies. Black cells indi- 
cate A looks normal more often than B. Red cells indicate A looks normal 
less often than B. 



Downloaded from http://biorxiv.org/on September 18, 2014 



JOSHUA G. SCHRAIBER AND MICHAEL J. LANDIS 




Figure 4. Central moments. From left to right, the panels correspond to 
the central moments, /12, /13, and /14, respectively, for the sampling distri- 
butions evolving under various mutational kernels. Data were simulated 
for 1024 sampled individuals and 2000 replicates for eight values of L, the 
number of loci. Colors distinguish the mutational kernel and relevant ker- 
nel parameters (if any). Solid lines correspond to moment values computed 
from the simulated data. Dashed lines correspond to the expected moment 
values. 




Figure 5. Frequency to reject unimodal sampling distribution. Solid lines 
report the frequency the null hypothesis of the dip-test, that the sampling 
distribution was unimodal, was rejected for p < 0.05 when evolving under 
various mutational kernels. Data were simulated for 1024 sampled individ- 
uals and 2000 replicates for eight values of L, the number of loci. Colors 
distinguish the mutational kernel and relevant kernel parameters (if any). 
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FIGURE 6. N. crassa gene expression trait distributions. All panels share a 
common y-axis, the log of the Shapiro- Wilk test's p- value, where values less 
than log(0.05) are below the dashed horizontal line, indicating the rejection 
of normality. A) The log of the dip test's p- value, where values less than 
log(0.05) are to the left of the dashed vertical line, meaning the rejection 
of unimodality. B) Skewness, where the dotted line shows the expected 
skewness under normality, 0. C) Kurtosis, where the dotted line shows the 
expected kurtosis under normality, 3. 
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Figure 7. Examples of non- normal gene expression trait distributions. 
One hundred sixteen gene expression trait distributions (of 7079 genes) 
were found to be significantly non-normal under the Shapiro- Wilk test with 
a false discovery rate of a = 0.1. Of the 116 genes, six genes (NCU08194, 
NCU09073, NCU05196, NCU01838, NCU08193, NCU09243) that strongly 
rejected normality were chosen to represent the variety of skewed, leptokur- 
tic, and multimodal distributions sampled for many genes. 



